{
 "cells": [
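  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 10: Working with categorical data\n",
    "Categorical data classifies or groups observations according to some attribute, so each value reflects a type rather than a quantity; it is also called nominal data. For example, a population can be divided by sex into male and female, and enterprises can be divided by ownership into state-owned, collective, private, and other. The labels \"male\" and \"female\", or \"state-owned\", \"collective\", \"private\", and \"other\", are categorical data.\n",
    "\n",
    "To make such data easier for a computer to process, each category is usually represented by a numeric code, for example 1 for \"male\" and 0 for \"female\". These numbers are only codes: they carry no quantitative relationship or difference."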
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
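  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 1: Creating categorical data\n",
    "Categorical data is easy to interpret and efficient to process. A categorical Series, or a categorical column in a DataFrame, can be created in several ways."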
   ]
  },
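  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a Series\n",
    "Take blood types as an example: suppose each user has one of the blood types below. How do we create a categorical object for them?  \n",
    "Pass dtype=\"category\" to specify the data type:"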
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "Name: blood_type, dtype: category\n",
       "Categories (4, object): ['A', 'AB', 'B', 'O']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=[\"A\", \"AB\", np.nan, \"AB\", \"O\", \"B\"], name=\"blood_type\", dtype=\"category\")\n",
    "blood"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An existing Series can be converted to categorical data with .astype('category'):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "Name: blood_type, dtype: category\n",
       "Categories (4, object): ['A', 'AB', 'B', 'O']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=[\"A\", \"AB\", np.nan, \"AB\", \"O\", \"B\"], name=\"blood_type\")\n",
    "blood.astype('category')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some functions, such as pd.cut, return categorical data automatically:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    20 and under\n",
       "1        21 to 30\n",
       "2        31 to 40\n",
       "3    20 and under\n",
       "4             NaN\n",
       "5        21 to 30\n",
       "6        31 to 40\n",
       "7        21 to 30\n",
       "Name: age, dtype: category\n",
       "Categories (4, object): ['20 and under' < '21 to 30' < '31 to 40' < '41 and over']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bins = [0,20,30,40,100]\n",
    "labels = ['20 and under','21 to 30','31 to 40','41 and over']\n",
    "age = pd.Series([18, 30, 35, 18, np.nan, 30, 37, 25], name='age')\n",
    "pd.cut(age,bins=bins, labels=labels)  # the optional labels argument gives each bin a custom label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### pd.Categorical\n",
    "pandas.Categorical creates a categorical array, which can then be used as a Series or as a DataFrame column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['A', 'AB', NaN, 'AB', 'O', 'B']\n",
       "Categories (4, object): ['A', 'AB', 'B', 'O']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Categorical([\"A\", \"AB\", np.nan, \"AB\", \"O\", \"B\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['A', 'AB', NaN, 'AB', NaN, 'B']\n",
       "Categories (3, object): ['A', 'B', 'AB']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# specify every allowed category; values outside the list ('O' here) become NaN\n",
    "pd.Categorical([\"A\", \"AB\", np.nan, \"AB\", \"O\", \"B\"], categories=[\"A\", \"B\", \"AB\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a DataFrame\n",
    "As with a single column, all columns of a DataFrame can be converted to categorical in one step, either during construction or afterwards."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "blood    category\n",
       "sex      category\n",
       "dtype: object"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"blood\": [\"A\", \"AB\",\"AB\", \"O\", \"B\"],\n",
    "    'sex': ['male', 'male', 'female', 'male', 'male']\n",
    "}\n",
    "user_info = pd.DataFrame(data, dtype=\"category\")\n",
    "user_info.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An existing DataFrame can also be converted with df.astype('category')."
   ]
  },
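  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CategoricalDtype\n",
    "CategoricalDtype is the pandas type object for categorical data. It accepts the following parameters:\n",
    "\n",
    "categories: a sequence of unique values with no missing values  \n",
    "ordered: a boolean that controls whether the categories are ordered; the default is False (unordered)"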
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.CategoricalDtype(['a', 'b', 'c'])\n",
    "# CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.CategoricalDtype(['a', 'b', 'c'], ordered=True)\n",
    "# CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CategoricalDtype(categories=None, ordered=False)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.CategoricalDtype()\n",
    "# CategoricalDtype(categories=None, ordered=False)"
   ]
  },
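  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A CategoricalDtype can be used anywhere pandas accepts a dtype, for example in pandas.read_csv(), pandas.DataFrame.astype(), or the Series constructor.\n",
    "\n",
    "As a convenience, the string 'category' can be used in place of a CategoricalDtype when you want the default behavior: unordered categories equal to the set of values present in the array. In other words, dtype='category' is equivalent to dtype=CategoricalDtype().\n",
    "\n",
    "Two instances of CategoricalDtype compare equal whenever they have the same categories and the same ordered setting. When two unordered categoricals are compared, the order of the categories is not considered."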
   ]
  },
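  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Controlling behavior\n",
    "In the examples above we specified the categorical type with dtype='category', which means that:\n",
    "- the categories are inferred from the data  \n",
    "- the categories are unordered  \n",
    "\n",
    "We can also define the categories explicitly with a CategoricalDtype instance and, optionally, give them an order. Values that are not among the listed categories (\"a\" in the example below) become NaN:"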
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    NaN\n",
       "1      b\n",
       "2      c\n",
       "3    NaN\n",
       "dtype: category\n",
       "Categories (3, object): ['b' < 'c' < 'd']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([\"a\", \"b\", \"c\", \"a\"])\n",
    "cat_type = pd.CategoricalDtype(categories=[\"b\", \"c\", \"d\"],ordered=True)\n",
    "s_cat = s.astype(cat_type)\n",
    "s_cat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, a CategoricalDtype can be used with a DataFrame to ensure that the categories are consistent across all columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "1    b\n",
       "2    c\n",
       "3    a\n",
       "Name: A, dtype: category\n",
       "Categories (4, object): ['a' < 'b' < 'c' < 'd']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})\n",
    "cat_type = pd.CategoricalDtype(categories=list('abcd'),\n",
    "                            ordered=True)\n",
    "df_cat = df.astype(cat_type)\n",
    "df_cat['A']"
   ]
  },
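  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use every label that appears anywhere in the DataFrame as the categories of each column, the categories can be determined programmatically:\n",
    "\n",
    "`categories = pd.unique(df.to_numpy().ravel())`  \n",
    "If you already have the codes and the categories, the from_codes() constructor skips the factorization step that the regular constructor performs:"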
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     test\n",
       "1     test\n",
       "2    train\n",
       "3    train\n",
       "4     test\n",
       "dtype: category\n",
       "Categories (2, object): ['train', 'test']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])\n",
    "s = pd.Series(pd.Categorical.from_codes(splitter,categories=[\"train\", \"test\"]))\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Getting the original data back\n",
    "To get back the original Series or a NumPy array, use Series.astype(original_dtype) or np.asarray(categorical):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['A', 'AB', 'B', 'O']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=[\"A\", \"AB\", np.nan, \"AB\", \"O\", \"B\"], dtype=\"category\")\n",
    "blood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: object"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.astype(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['A', 'AB', nan, 'AB', 'O', 'B'], dtype=object)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.asarray(blood)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 2: Using categorical data"
   ]
  },
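  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Structure of categorical data\n",
    "A categorical variable has three parts: the element values (values), the categories (categories), and whether it is ordered (ordered).  \n",
    "As the pd.cut example above shows, categoricals created with pd.cut are ordered by default, whereas the other constructors produce unordered categoricals unless ordered=True is passed.  \n",
    "The following shows how to inspect or modify these attributes."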
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see which attributes and methods the .cat accessor provides. Almost all of them are for inspecting and manipulating the Category dtype."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['add_categories',\n",
       " 'as_ordered',\n",
       " 'as_unordered',\n",
       " 'categories',\n",
       " 'codes',\n",
       " 'ordered',\n",
       " 'remove_categories',\n",
       " 'remove_unused_categories',\n",
       " 'rename_categories',\n",
       " 'reorder_categories',\n",
       " 'set_categories']"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[i for i in dir(blood.cat) if not i.startswith('_')]"
   ]
  },
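  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The describe method\n",
    "This method summarizes a categorical series: the number of non-missing values, the number of distinct element values (not the number of categories), the most frequent element, and its frequency."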
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['A', 'AB', 'B', 'O']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count      5\n",
       "unique     4\n",
       "top       AB\n",
       "freq       2\n",
       "dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The categories attribute\n",
    "Shows the categories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['A', 'AB', 'B', 'O'], dtype='object')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.categories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The ordered attribute\n",
    "Shows whether the categories are ordered."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.ordered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modifying categories"
   ]
  },
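  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### set_categories\n",
    "Replaces the set of categories. Existing values are left as they are, except that any value missing from the new categories becomes NaN."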
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1    NaN\n",
       "2    NaN\n",
       "3    NaN\n",
       "4      O\n",
       "5    NaN\n",
       "dtype: category\n",
       "Categories (3, object): ['RH', 'A', 'O']"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.set_categories(['RH', 'A', 'O'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### rename_categories\n",
    "Note that this method changes both the categories and the corresponding values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     new_A\n",
       "1    new_AB\n",
       "2       NaN\n",
       "3    new_AB\n",
       "4     new_O\n",
       "5     new_B\n",
       "dtype: category\n",
       "Categories (4, object): ['new_A', 'new_AB', 'new_B', 'new_O']"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.rename_categories(['new_%s'%i for i in blood.cat.categories])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass a dict to rename only selected categories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      a\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      b\n",
       "dtype: category\n",
       "Categories (4, object): ['a', 'AB', 'b', 'O']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.rename_categories({\"A\": 'a', \"B\": 'b'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding categories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Adding with add_categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['A', 'B', 'AB', 'O']"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=pd.Categorical([\"A\", \"AB\", \"RH\", \"AB\", \"O\", \"B\"], categories=['A', 'B', 'AB', 'O']))\n",
    "blood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (5, object): ['A', 'B', 'AB', 'O', 'RH']"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.add_categories(['RH'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Removing categories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### remove_categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    NaN\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (3, object): ['B', 'AB', 'O']"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.remove_categories(['A'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Removing categories that do not appear among the values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    AB\n",
       "1    AB\n",
       "2     O\n",
       "3     B\n",
       "dtype: category\n",
       "Categories (4, object): ['A', 'B', 'AB', 'O']"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=pd.Categorical([\"AB\", \"AB\", \"O\", \"B\"], categories=['A', 'B', 'AB', 'O']))\n",
    "blood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    AB\n",
       "1    AB\n",
       "2     O\n",
       "3     B\n",
       "dtype: category\n",
       "Categories (3, object): ['B', 'AB', 'O']"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.remove_unused_categories()"
   ]
  },
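  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ordering\n",
    "New categorical data is not ordered automatically. You must pass ordered=True explicitly to indicate an ordered categorical.\n",
    "\n",
    "Inspect the categories and whether they are ordered:"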
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['A', 'B', 'AB', 'O'], dtype='object')"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood = pd.Series(data=pd.Categorical([\"A\", \"AB\", \"RH\", \"AB\", \"O\", \"B\"], categories=['A', 'B', 'AB', 'O']))\n",
    "blood.cat.categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.ordered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### as_ordered\n",
    "To turn a series into an ordered categorical, use the as_ordered method; as_unordered reverses the change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['A' < 'B' < 'AB' < 'O']"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.as_ordered()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.as_ordered().cat.ordered"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['A', 'B', 'AB', 'O']"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.as_ordered().cat.as_unordered()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### set_categories\n",
    "Use the ordered parameter of the set_categories method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1    NaN\n",
       "2    NaN\n",
       "3    NaN\n",
       "4      O\n",
       "5    NaN\n",
       "dtype: category\n",
       "Categories (3, object): ['RH' < 'A' < 'O']"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.set_categories(['RH', 'A', 'O'], ordered=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### reorder_categories\n",
    "The distinguishing feature of this method is that the new categories must form the same set as the original ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      A\n",
       "1     AB\n",
       "2    NaN\n",
       "3     AB\n",
       "4      O\n",
       "5      B\n",
       "dtype: category\n",
       "Categories (4, object): ['AB' < 'A' < 'B' < 'O']"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood.cat.reorder_categories(['AB', 'A', 'B', 'O'], ordered=True)"
   ]
  },
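  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting\n",
    "Sorting a categorical Series or index follows the order of its categories rather than the lexical order of the values."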
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     good\n",
       "1     fair\n",
       "2     good\n",
       "3    awful\n",
       "4      bad\n",
       "dtype: category\n",
       "Categories (5, object): ['awful' < 'bad' < 'fair' < 'good' < 'perfect']"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(np.random.choice(['perfect','good','fair','bad','awful'],50)).astype('category')\n",
    "# set_categories returns a new Series; s itself is not modified by this call\n",
    "s.cat.set_categories(['perfect','good','fair','bad','awful'][::-1],ordered=True).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "49    perfect\n",
       "36    perfect\n",
       "15    perfect\n",
       "27    perfect\n",
       "28    perfect\n",
       "dtype: category\n",
       "Categories (5, object): ['awful', 'bad', 'fair', 'good', 'perfect']"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sort_values(ascending=False).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cat</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>good</th>\n",
       "      <td>-0.881925</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fair</th>\n",
       "      <td>-0.798461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>good</th>\n",
       "      <td>-1.335855</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>0.985629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bad</th>\n",
       "      <td>1.423465</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          value\n",
       "cat            \n",
       "good  -0.881925\n",
       "fair  -0.798461\n",
       "good  -1.335855\n",
       "awful  0.985629\n",
       "bad    1.423465"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_sort = pd.DataFrame({'cat':s.values,'value':np.random.randn(50)}).set_index('cat')\n",
    "df_sort.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cat</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>0.257634</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>0.012458</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>0.153295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>-0.093060</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>0.010395</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          value\n",
       "cat            \n",
       "awful  0.257634\n",
       "awful  0.012458\n",
       "awful  0.153295\n",
       "awful -0.093060\n",
       "awful  0.010395"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_sort.sort_index().head()"
   ]
  },
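  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparisons\n",
    "Categorical data can be compared with other objects in three cases:\n",
    "\n",
    "- Equality comparisons (== and !=) with a list-like object (list, Series, array, etc.) of the same length as the categorical data  \n",
    "- All comparisons (==, !=, >, >=, <, and <=) with another categorical Series, when ordered == True and the categories are the same\n",
    "- All comparisons of categorical data with a scalar\n",
    "\n",
    "All other comparisons raise a TypeError, in particular \"non-equality\" comparisons of two categoricals with different categories, or of a categorical with any list-like object.\n",
    "\n",
    "Any \"non-equality\" comparison of categorical data with a Series, np.array, list, or categorical that has different categories or ordering raises a TypeError, because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not."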
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Comparison with a scalar or an equal-length sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2    False\n",
       "3     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([\"a\", \"d\", \"c\", \"a\"]).astype('category')\n",
    "s == 'a'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s == list('abcd')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Comparison with another categorical"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Equality tests (== and !=)  \n",
    "Equality tests between two categoricals require that their categories be exactly the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "3    True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([\"a\", \"d\", \"c\", \"a\"]).astype('category')\n",
    "s == s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s != s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_new = s.cat.set_categories(['a','d','e'])\n",
    "# s == s_new  # raises TypeError because the categories differ"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Inequality tests (>=, <=, <, >)\n",
    "Inequality tests between two categoricals require two conditions: (1) identical categories and (2) identical ordering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series([\"a\", \"d\", \"c\", \"a\"]).astype('category')\n",
    "# s >= s  # raises TypeError because the categorical is unordered"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "3    True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([\"a\", \"d\", \"c\", \"a\"]).astype('category').cat.reorder_categories(['a','c','d'],ordered=True)\n",
    "s >= s"
   ]
  },
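  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second condition can be illustrated with two ordered categoricals that share the same categories but rank them differently. This is a sketch under the assumption that pandas raises `TypeError` in this case; `asc` and `desc` are made-up names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "values = ['a', 'd', 'c', 'a']\n",
    "asc = pd.Series(pd.Categorical(values, categories=['a', 'c', 'd'], ordered=True))\n",
    "desc = pd.Series(pd.Categorical(values, categories=['d', 'c', 'a'], ordered=True))\n",
    "\n",
    "try:\n",
    "    asc >= desc  # same categories, but a different order\n",
    "except TypeError as err:\n",
    "    print('TypeError:', err)\n",
    "\n",
    "# Identical categories in identical order: the comparison is allowed.\n",
    "print((asc >= asc).all())"
   ]
  },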
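  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 3: Why use categorical data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Everything categorical data does can also be done without the category dtype, so what does the dtype actually offer? With large datasets, memory becomes a real constraint. Even if we rarely hold gigabytes of data in memory, we have all waited a long time for a few lines of code to finish. The main benefit of the category dtype is that it saves both time and space, as the examples below show.\n",
    "\n",
    "One of the most important reasons to use it is that a pandas Categorical is essentially a mapping from each category to a unique (downcast) integer, and those integers take up (roughly) less space than the strings that make up an object column. The memory usage of a Categorical is proportional to the number of categories plus the length of the data, whereas an object column uses a constant times the length of the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Memory usage"
   ]
  },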
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    59\n",
       "1    59\n",
       "2    61\n",
       "3    59\n",
       "4    61\n",
       "5    53\n",
       "6    53\n",
       "7    59\n",
       "8    53\n",
       "9    53\n",
       "dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colors = pd.Series([\n",
    "    'periwinkle',\n",
    "    'mint green',\n",
    "    'burnt orange',\n",
    "    'periwinkle',\n",
    "    'burnt orange',\n",
    "    'rose',\n",
    "    'rose',\n",
    "    'mint green',\n",
    "    'rose',\n",
    "    'navy'\n",
    "])\n",
    "import sys\n",
    "colors.apply(sys.getsizeof)"
   ]
  },
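  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above uses `sys.getsizeof` to show the memory taken by each element; the numbers are bytes.\n",
    "\n",
    "Another way to measure memory is `memory_usage()`, which is used later.\n",
    "\n",
    "Now let's map the unique values of `colors` to a set of integers and look at the memory again."
   ]
  },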
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'periwinkle': 0, 'mint green': 1, 'burnt orange': 2, 'rose': 3, 'navy': 4}"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mapper = {v: k for k, v in enumerate(colors.unique())}\n",
    "mapper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    1\n",
       "2    2\n",
       "3    0\n",
       "4    2\n",
       "5    3\n",
       "6    3\n",
       "7    1\n",
       "8    3\n",
       "9    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_int = colors.map(mapper)\n",
    "as_int"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    24\n",
       "1    28\n",
       "2    28\n",
       "3    24\n",
       "4    28\n",
       "5    28\n",
       "6    28\n",
       "7    28\n",
       "8    28\n",
       "9    28\n",
       "dtype: int64"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_int.apply(sys.getsizeof)"
   ]
  },
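  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: the integer mapping above can also be produced more simply with `pd.factorize()`.\n",
    "\n",
    "Each integer takes roughly half the memory of the corresponding string stored as object dtype (24 to 28 bytes versus 53 to 61 bytes). This is similar in spirit to how the category dtype works internally."
   ]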
  },
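  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the `pd.factorize()` route mentioned in the note, next to the codes that the category dtype keeps internally. The Series `sample` is a shortened, illustrative copy of the colour data. The two numberings may differ, because `factorize` numbers values in order of first appearance while `astype('category')` sorts the categories by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "sample = pd.Series(['periwinkle', 'mint green', 'burnt orange',\n",
    "                    'periwinkle', 'rose', 'navy'])\n",
    "\n",
    "# codes: one integer per element; uniques: the distinct values\n",
    "codes, uniques = pd.factorize(sample)\n",
    "print(codes)\n",
    "print(uniques)\n",
    "\n",
    "# The category dtype stores the same two pieces: codes and categories.\n",
    "as_cat = sample.astype('category')\n",
    "print(as_cat.cat.codes.tolist())\n",
    "print(as_cat.cat.categories.tolist())"
   ]
  },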
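  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Characteristics of categorical variables\n",
    "The memory used by a Categorical is proportional to the number of categories plus the length of the data. In contrast, an object column uses a constant times the length of the data.\n",
    "\n",
    "Below is a comparison of memory usage for object and category."
   ]
  },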
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "650"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colors.memory_usage(index=False, deep=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "507"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colors.astype('category').memory_usage(index=False, deep=True)"
   ]
  },
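  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two results above are the memory used as object (650 bytes) and as category (507 bytes). The saving is smaller than one might hope. Keep in mind, though, that the category cost depends on the number of categories plus the data length: when the dataset is large but has few unique values, category can use less than a tenth of the memory, as with the enlarged dataset below:"
   ]
  },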
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20.0"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manycolors = colors.repeat(10)\n",
    "len(manycolors) / manycolors.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6500"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manycolors.memory_usage(index=False, deep=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "597"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manycolors.astype('category').memory_usage(index=False, deep=True)"
   ]
  },
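  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the data grows tenfold, the category version uses less than a tenth of the memory of the object version (597 versus 6500 bytes)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at another example."
   ]
  },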
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       AB\n",
       "1        O\n",
       "2       AB\n",
       "3        O\n",
       "4       AB\n",
       "        ..\n",
       "1995     O\n",
       "1996    AB\n",
       "1997     O\n",
       "1998    AB\n",
       "1999     O\n",
       "Length: 2000, dtype: object"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type = pd.Series([\"AB\",\"O\"]*1000)\n",
    "blood_type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "16000"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type.nbytes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2016"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type.astype(\"category\").nbytes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       AB   0\n",
       "1       AB   1\n",
       "2       AB   2\n",
       "3       AB   3\n",
       "4       AB   4\n",
       "         ...  \n",
       "1995    AB1995\n",
       "1996    AB1996\n",
       "1997    AB1997\n",
       "1998    AB1998\n",
       "1999    AB1999\n",
       "Length: 2000, dtype: object"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type = pd.Series(['AB%4d' % i for i in range(2000)])\n",
    "blood_type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "16000"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type.nbytes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20000"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blood_type.astype(\"category\").nbytes"
   ]
  },
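  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `nbytes` on an object Series counts only the pointers (8 bytes each on a 64-bit build), not the strings they point to, which is why both object results above are 16000. The sketch below repeats the comparison with `memory_usage(deep=True)`, assuming the strings are stored as object dtype as in the outputs above; `few_unique` and `all_unique` are illustrative names. When every value is unique, the categorical has to keep all the strings plus the codes, so it should come out no smaller than the object version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "few_unique = pd.Series(['AB', 'O'] * 1000)\n",
    "all_unique = pd.Series(['AB%4d' % i for i in range(2000)])\n",
    "\n",
    "for name, ser in [('few unique', few_unique), ('all unique', all_unique)]:\n",
    "    # deep=True also counts the Python string objects themselves\n",
    "    as_object = ser.memory_usage(index=False, deep=True)\n",
    "    as_category = ser.astype('category').memory_usage(index=False, deep=True)\n",
    "    print(name, '| object:', as_object, '| category:', as_category)"
   ]
  },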
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 1.0, 'Memory usage of object vs. category dtype')"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA88ElEQVR4nO3dd3gU5fbA8e8hJARCbxIIEDoEEiCEhF4UpIgggoJiQUXEe9Wr/lRALNgRu9eC6BXsIlVEKTaU3hQSCB0ChBpaqIEk+/7+mElcwqYQsmyyez7Pkye70/a8s7tzZt6ZPSPGGJRSSvmuYp4OQCmllGdpIlBKKR+niUAppXycJgKllPJxmgiUUsrHaSJQSikfp4lAeSURKSkiP4hIsohMvcR5u4hIYg7jJ4jI05cfpboSRGSoiCz2dByFmSaCAiIiCSJyXkQqZxm+VkSMiIR6KDRfNRC4CqhkjLmpIBdsjBlhjHnhcpZR1DZOIjJZRF70dBwFwf4+1vd0HIWJJoKCtRO4JeOJiIQDJT0Xzj9EpLinY7jCagNbjDFpng5EXR4R8fN0DF7PGKN/BfAHJABPAauchr0OjAEMEGoPK2EP3w0cBCYAJe1xXYBE4AngELAfuAHoDWwBjgJPOi2/BPA2sM/+exsokWVZI4EDwBfAeuB6p/n9gcNACxftGQoszjLMAPXtx72BeOAksBd4zB5eAZgDJAHH7MchTsuoA/xpz/cL8D7wpdP4NsBS4DiwDuiSwzpvAiy0p90A9LWHPwecB1KBU8A9LubNy7p70l4/CcAQp3knAy86Pe8DrLXjWApEOI2rCcyw18cR4D077hQg3Y7vuIv4BgOrswx7BJid0/rPw+e0JPAGsAtIBhbzz+dvqv1ZSbbfo6b28OH2ujxvx/uDPbw6MN1u207goSyv85n9GdiI9ZlOzO29c1q/HwI/AaeBx7G+K8WdphkArM2mjZWA2cAJYCXwAvZn2W6XsZd7ChhEDt8LINSefrj9OdkP/J/TtMWAUcB2+/39Dqjo6e3RJW+/PB2At/zZG4tuwGb7Q+4H7MHaM3VOBG/bH9KKQBngB+AVe1wXIA14xv4w3mt/yb62p22KtQGpa0//PLAcqApUwdoIvZBlWa9ibfRK2l/GKU4x9wPismnPUHJOBPuBjvbjCkCk/biS/SUtZcc8FZjltIxlWIkwAOhgf1m/tMfVsL9Mve0vWHf7eRUX8fkD27A21gHA1VgbxUb2+LE4JRgX8+dl3b1pr7vOWBuOjGVPxk4EQCRW0o6x3/M77c9CCfv5OuAtIAgIBDpkt36zxFfKbk8Dp2GrgME5rf88fE7fx9oA17Dja8c/CfBu+z3LSJJrnebLbLP9vBiwBuuzGgDUBXYAPezx44A/7NhCgFjsRJCH924yVjJqb79OIFbS6+X0+jNx2iBnaeO3WBvkIKAZVqJc7DQ+83NsP8/2e8E/ieAbe3nhWN/Jbvb4h7E+RyH2evsI+MbT26NL3n55OgBv+eOfRPAU8ArQE/gZKG5/kEIBwdqg1HOary2w037cBTgL+NnPy9jzxjhNvwa4wX68HejtNK4HkOC0rPNAoNP46vYXrqz9fBrwRDbtGUrOiWA3cF/GsnJYLy2AY/bjWlgb2FJO47/kn0QwEvgiy/zzgTtdLLcj1t5rMadh3wBj7cdjyTkR5Lbu0oAgp/HfAU/bjyfzTyL4EDuBOE27GSt5tLU3GsVdvP5F69fFNF8Cz9iPG9jvXalLWf9ZllfM/nw1z8O05e33u1zWNtvPY4DdWeYZDUyyH2cmBfv5MP5JBLm9d5OBz7MseyTwlf24InAGCHYRtx/W0Utjp2Evk3MiyPZ7wT+JwHl544H/2Y83Atc4jQu2X/+i97ww/+k5goL3BXAr1hf98yzjqmDt6a0RkeMichyYZw/PcMQYk24/Pmv/P+g0/ixQ2n5cHesQP8Mue1iGJGNMSsYTY8w+YAkwQETKA72A
ry6hbc4GYO257xKRP0SkLYCIlBKRj0Rkl4icwDoUL2/381YHjhpjzjgtZ4/T49rATRnrxl4/HbC+XFlVB/YYYxxOw3Zh7enmRW7r7pgx5nQO451j/r8sMde0p60J7DL5P0/xNf+cc7oV68gqY925XP+5qIy1d7096wgR8RORcSKy3X7fEpzmcaU2UD1Lu5/EOkEP9vvjNL3z47y8d87Tg5UUrxeR0sDNwCJjzH4XcVXB2vlynn+Xi+ky5fF7kXV5GZ+F2sBMp3WwEavL7yqKEE0EBcwYswurv7Q3Vt+ws8NYG/Kmxpjy9l85Y0zprMvJo31YH8QMtexhmeG4mOcz4DbgJmCZMWZvNss+jZW0ABCRas4jjTGrjDH9sLpWZmHtMQP8H9AI6yimLNApYxFY3RkVRaSU06JqOj3eg3VEUN7pL8gYM85FfPuAmiLi/BmuhdUNkBe5rbsKIhKUw3jnmF/KEnMpY8w39rha2Zyod/XeZLUAqCwiLbASwteZM2e//nNyGKtrsZ6LcbdidYl0A8ph7QmD9b65incP1pGsc7vLGGN62+P3Y3WXZHB+n/Py3l3wevbndBnQH7gda4fLlSSsoznn16uVzbTOcvteZF1exmdhD1aXlfN6CMzhe1UoaSJwj3uAq7PsUWLvAX0MvCUiVQFEpIaI9Mjn63wDPCUiVezLVp/B2nPKySysfu3/cPERi7N1QFMRaSEigVhdLdgxB4jIEBEpZ4xJxernzziKKYOV7I6LSEXg2Yz57CS5GhhrL6MtcL3Ta2bs9fWw91AD7Wv6nTcoGVZgJasnRMRfRLrYy/o2l/ZnyMu6e86OsyPWCWFXv0f4GBghIjFiCRKR60SkDNaJyv3AOHt4oIi0t+c7CISISEB2AdpHEtOA17C6Q36GXNd/tuzP36fAmyJS3V7HbUWkBNb7dg7rnEwprO4UZwexzgNkWAmcEJGR9m82/ESkmYi0tsd/B4wWkQoiUgN4wGne/L53n2P154djnSNw1cZ0rB2wsfbRaRjWeZuc2gK5fy+etpfXFLgLmGIPnwC8JCK1AezPU79c2lH4eLpvylv+sM8RuBieeY7Afh6I9SXbgfUF3oh9tQX21SrZzWsPWwzc5rSsd7E2Nvvtx4GulpUlpk+wvoilc2nTGKy9yD1Ye0sGqI91gm8e1hUhJ7BOYmacBK2OdTLyFNaVTvfZ8xW3x9cDFmH1yf4KTMTub7XHx2CdZDyKtXf3I1Arm/ia2tMmY51M7O80biw5nyPIdd05tX83cLvTvJO5sL+8p70OjtvLmgqUscfVwtrIHLGX9a49PMBu21HgcA5xdrTX3/tOw3Ja/7XsdZ/dOiuJdSJ4L/9cHVQSq7vxe/t92QXcwYXnhBrwz5VRs5ze62+w+vuPYZ00zTiJGoS1134c6zP+FLA9j+/dBevXaXgpu72f5fK5rYJ1tdpFVw3Z40fY79Nx4OacvhdcfNXQAZzOq2HtTD+KdV7oJFa328ue3h5d6p/YjVE+RESeARoaY24rBLFMATYZY57NdeJCQkQ+B7YZY573dCxFhYjcj3XFU+fLXM524D5jzC8FE9kFy77oe2H/EHQn4G+8+Dcp2jXkY+zumnuw9sQ98fqtRaSeiBQTkZ5Y/dKzPBFLftj9/Y2wNg4qGyISLCLt7fe5Eda5I5fdOZewzAFYe+e/FUSMWZbt0e+Fp2ki8CEici9WN89cY8yfHgqjGv90Hb0L3G+M+dtDseTHAawuhekejqOwC8C6pv4k1ob7e+CD/C5MRBZiXar7b3Ph1UaXrZB8LzxKu4aUUsrH6RGBUkr5uCJXiKxy5comNDTU02EopVSRsmbNmsPGmCquxhW5RBAaGsrq1as9HYZSShUpIpLtL6y1a0gppXycJgKllPJxmgiUUsrHFblzBK6kpqaSmJhISkpK7hOrQiUwMJCQkBD8/f09HYpSPssrEkFiYiJlypQhNDQUEcl9BlUoGGM4cuQIiYmJ1KlTx9PhKOWz
vKJrKCUlhUqVKmkSKGJEhEqVKumRnFIe5hWJANAkUETp+6aU53lNIlBKKW+Vmu7gg4XbWLfnuFuWr4nAjRISEmjWrJnLccOGDSM+Pv6Sl7l27Vp++umnPE3bpUuXzB/fvfxy1vuMKKWKgvV7k7nh/SWMn7eZuesPuOU1NBF4yCeffEJYWNglz3cpicCZJgKlipaU1HRem7+Jfu8v4eCJc3w4JJJRvRq75bU0ERSQN998k2bNmtGsWTPefvvtzOFpaWnceeedREREMHDgQM6cse497ry3vmDBAtq2bUtkZCQ33XQTp06dAmDVqlW0a9eO5s2bEx0dTXJyMs888wxTpkyhRYsWTJky5YIYzp49y+DBg4mIiGDQoEGcPXsWgFGjRnH27FlatGjBkCFDePrpp3nnnXcy5xszZgzvvvsuCxcupFOnTvTv35+wsDBGjBiBw+HIMUalVMFbnXCU3u8u4v3ft3Njyxr8+mhneoUHu+31ilwZ6qioKJO11tDGjRtp0qQJAM/9sIH4fScK9DXDqpfl2eubZjt+zZo1DB06lOXLl2OMISYmhi+//JIKFSpQp04dFi9eTPv27bn77rsJCwvjscceo0uXLrz++uuEhoZy4403MnfuXIKCgnj11Vc5d+4co0aNonHjxkyZMoXWrVtz4sQJSpUqxZdffsnq1at57733LorjzTffZP369Xz66afExsYSGRnJ8uXLiYqKonTp0pkb74SEBG688Ub++usvHA4HDRo0YOXKlcTFxdGzZ0/i4+OpXbs2PXv25L777qNLly4uY3zmmWcKZP06v39K+bJT59J4bd4mPl++i+rlSvLKjeF0auiyTtwlE5E1xpgoV+O84ncEnrZ48WL69+9PUFAQADfeeCOLFi2ib9++1KxZk/btrfuV33bbbbz77rs89thjmfMuX76c+Pj4zGnOnz9P27Zt2bx5M8HBwbRubd0LvGzZsrnG8eeff/LQQw8BEBERQUREhMvpQkNDqVSpEn///TcHDx6kZcuWVKpUCYDo6Gjq1rXu633LLbewePFiAgMDXcaolCo4f2xJ4skZcexLPsudbUN5vEcjgkpcmU201yWCnPbc3SWno6qsl0dmfW6MoXv37nzzzTcXDI+Njc3XpZV5nWfYsGFMnjyZAwcOcPfdd+cYb3YxKqUu3/Ez53lhzkam/5VIvSpBTL2vLVGhFa9oDHqOoAB06tSJWbNmcebMGU6fPs3MmTPp2LEjALt372bZsmUAfPPNN3To0OGCedu0acOSJUvYtm0bAGfOnGHLli00btyYffv2sWrVKgBOnjxJWloaZcqU4eTJk9nG8dVXXwGwfv16YmNjM8f5+/uTmpqa+bx///7MmzePVatW0aNHj8zhK1euZOfOnTgcDqZMmUKHDh2yjVEpdXnmxu2n25t/MmvtXh7oWp8fH+p4xZMAaCIoEJGRkQwdOpTo6GhiYmIYNmwYLVu2BKBJkyZ89tlnREREcPToUe6///7M+USEKlWqMHnyZG655RYiIiJo06YNmzZtIiAggClTpvDggw/SvHlzunfvTkpKCl27diU+Pt7lyeL777+fU6dOERERwfjx44mOjs4cN3z4cCIiIhgyZAgAAQEBdO3alZtvvhk/P7/M6dq2bcuoUaNo1qwZderUoX///tnGqJTKn0MnUhjxxRru/+ovqpUrwewH2vNYj0YE+vvlPrMbeN3J4qIiPDyc2bNne6zGjsPhIDIykqlTp9KgQQMAFi5cyOuvv86cOXOuaCxF8f1TKj+MMUxdk8iLc+JJSXPwSLeG3NuxDsX93L9PrieLC5nu3bsTHh7usSQQHx9Pnz596N+/f2YSUEq5156jZ3hyZhyLth4mOrQi4waEU7dKaU+HBWgi8Iiff/7Zo68fFhbGjh07LhrepUsXunTpcuUDUsqLpTsMny9L4LX5mxHghX5NGRJTm2LFCk+dLU0ESinlJtsOnWTk9DjW7DpG54ZVePnGcGqUL+npsC6i
iUAppQpYarqDj/7Yzru/bqNUCT/eGtScG1rUKLTVdjURKKVUAYpLTOaJ6bFs3H+C6yKCea5vUyqXLuHpsHKkiUAppQpASmo6b/+ylY8X7aBSUAAf3d6KHk2reTqsPNHfEXjAwoULWbp0qafDcGny5Mk88MADAMyaNStfpbKV8jUrdhyh1zuLmPDHdgZGhvDzo52LTBIANycCEekpIptFZJuIjHIxvouIJIvIWvuvYKqYFXJXIhEYYzIrh+aXJgKlcnYyJZWnZ61n0MTlpDkcfDUshlcHRlCupL+nQ7skbksEIuIHvA/0AsKAW0TEVQH+RcaYFvbf8+6Kx90+//xzIiIiaN68ObfffjsAP/zwAzExMbRs2ZJu3bpx8OBBEhISmDBhAm+99RYtWrRg0aJFJCUlMWDAAFq3bk3r1q1ZsmQJAElJSXTv3p3IyEjuu+8+ateuzeHDhwHXZa8TEhJo0qQJ//rXv4iMjOSFF17gkUceyYzx448/5tFHH70o9kmTJtGwYUM6d+6c+dpLly5l9uzZPP7447Ro0YLt27cTGRmZOc/WrVtp1aoVYBWxGzlyJNHR0URHR2eWosiuXUp5g983H6LHW3/y5Ypd3N2+DvMf7kT7+pU9HVa+uPMcQTSwzRizA0BEvgX6Ae7dxZw7Cg7EFewyq4VDr3HZjt6wYQMvvfQSS5YsoXLlyhw9ehSADh06sHz5ckSETz75hPHjx/PGG28wYsQISpcunVmF9NZbb+WRRx6hQ4cO7N69mx49erBx40aee+45rr76akaPHs28efOYOHEiYJW9njRpEitWrMgse925c2cqVKjA5s2bmTRpEh988AGnT5/OLDfh7+/PpEmT+Oijjy6Iff/+/Tz77LOsWbOGcuXK0bVrV1q2bEm7du3o27cvffr0YeDAgQCUK1eOtWvX0qJFCyZNmsTQoUMzl1O2bFlWrlzJ559/zsMPP8ycOXP4z3/+47JdShVlx06f54U58cz4ey8NqpZm+v3tiKxVwdNhXRZ3JoIawB6n54lAjIvp2orIOmAf8JgxZkPWCURkODAcoFatWm4I9fL89ttvDBw4kMqVrb2BihWtolGJiYkMGjSI/fv3c/78+Wx/SfzLL79c0AVz4sQJTp48yeLFi5k5cyYAPXv2pEIF68OWU9nr2rVr06ZNGwCCgoK4+uqrmTNnDk2aNCE1NZXw8PALXnvFihV06dKFKlWsmueDBg3KtqDcsGHDmDRpEm+++SZTpkxh5cqVmeNuueWWzP8ZRyHZtatMmTK5rlOlChtjDD/G7efZ7zeQfDaVh65pwL+71qNEcc/UBypI7kwEri6YzVrY6C+gtjHmlIj0BmYBF9U8MMZMBCaCVWsox1fNYc/dXYwxLq8PfvDBB3n00Ufp27cvCxcuZOzYsS7ndzgcLFu2jJIlL/yhSXZ1oHKqD5WRHDIMGzaMl19+mcaNG3PXXXe5nCev1zYPGDAg8yilVatWmfcwyLqMjMfZtUupoubgiRSemrWen+MPEhFSji+HxdAkOPd7hBQV7jxZnAjUdHoegrXXn8kYc8IYc8p+/BPgLyJFrpPtmmuu4bvvvuPIkSMAmV1DycnJ1KhRA4DPPvssc/qspaSvvfbaC+44tnbtWsDqWvruu+8A61aRx44dA3Iue51VTEwMe/bs4euvv87ca886fuHChRw5coTU1FSmTp2abZyBgYH06NGD+++//6KkklEJdcqUKZk3rcmuXUoVFcYYpqzaTbc3/+DPLUk82bsxM+5v51VJANybCFYBDUSkjogEAIOB2c4TiEg1sXcfRSTajueIG2Nyi6ZNmzJmzBg6d+5M8+bNM0/Ijh07lptuuomOHTtmdhsBXH/99cycOTPzZPG7777L6tWriYiIICwsjAkTJgDw7LPPsmDBAiIjI5k7dy7BwcGUKVMmx7LXrtx88820b98+s2vJWXBwMGPHjqVt27Z069btghPCgwcP5rXXXqNly5Zs374dgCFDhiAi
XHvttRcs59y5c8TExPDOO+/w1ltvAWTbLqWKgt1HzjDkkxWMnB5HWHBZ5j/cieGd6l2RSqFXnDHGbX9Ab2ALsB0YYw8bAYywHz8AbADWAcuBdrkts1WrViar+Pj4i4Z5g5SUFJOammqMMWbp0qWmefPm+VrOddddZ3755ZcCiem1114zTz311AXDateubZKSkvK9TG99/1TRlJbuMJ8s2mEaPzXXNH1mnvlyeYJJT3d4OqzLBqw22WxX3frLYmN19/yUZdgEp8fvARffhV0B1t3Nbr75ZhwOBwEBAXz88ceXNP/x48eJjo6mefPmXHPNNZcdT//+/dm+fTu//fbbZS9LqcJoy8GTPDEtlrV7jnN146q81L8ZweW8/xyXlpgoxBo0aMDff/+d7/nLly9foLeUzLiCKauEhIQCew2lPOF8moMPF27nvd+3UibQn3cGt6Bv8+qFtkhcQfOaRGCyuXJHFW6miN0hT3mfdXuOM3J6LJsOnKRv8+o8e30YlQp5kbiC5hWJIDAwkCNHjlCpUiVNBkWIMYYjR44QGBjo6VCUDzp7Pp23ftnCJ4t2ULVMIJ/cEUW3sKs8HZZHeEUiCAkJITExkaSkJE+Hoi5RYGAgISEhng5D+Zhl248wekYsCUfOcEt0LUb3bkzZwKJVH6ggeUUi8Pf399j9f5VSRceJlFTGzd3E1yt2U7tSKb6+N4Z29YrcT5cKnFckAqWUys2vGw8yZuZ6Dp1MYXinujzSrSElA4p+eYiCoIlAKeXVjpw6x3M/xDN73T4aXVWGCbe3okXN8p4Oq1DRRKCU8krGGGav28dzP8RzMiWVR7o15P4u9Qgo7oW/DL5MmgiUUl5nf/JZnpq5nl83HaJ5zfKMHxBBo2pa9TY7mgiUUl7D4TB8u2oPr/y0kVSHg6eua8Jd7evgV0wvK8+JJgKllFdIOHyaUTNiWb7jKO3qVeKVG8OpXSko9xmVJgKlVNGWlu7g0yU7eWPBFgL8ijHuxnAGta6pPy69BJoIlFJF1qYDJxg5LZZ1icl0a3IVL97QjGrl9Jfql0oTgVKqyDmXls77v2/ng9+3Ua6kP/+9pSV9IoL1KCCfNBEopYqUv3cfY+T0WLYcPEX/ljV4uk8YFYMCPB1WkaaJQClVJJw5n8YbC7bw6ZKdVCsbyKdDo7i6sW8WiStomgiUUoXe0m2HGTUjjt1Hz3Bbm1qM7NmYMj5cJK6gaSJQShVayWdTeeWnjXy7ag91KgcxZXgbYupW8nRYXkcTgVKqUFqw4QBPzVrP4VPnuK+zVSQu0F+LxLmDJgKlVKFy+NQ5xs7ewJzY/TSuVoZP7owiIqS8p8PyapoIlFKFgjGGWWv38twP8Zw5l87/dW/IiC718PfTInHupolAKeVx+46fZczMOH7fnETLWlaRuAZXaZG4K0UTgVLKYxwOw1crdzPup404DDx7fRh3tA3VInFXmCYCpZRH7Eg6xajpcaxMOEqH+pV55cZwalYs5emwfJImAqXUFZWW7uCTxTt56+ctlChejPEDI7ipVYiWh/AgTQRKqSsmft8Jnpi+jvV7T9Cj6VW80K8ZVctqkThP00SglHK7c2npvPfbNj5cuJ3ypfz5YEgkvZpV06OAQsKt12WJSE8R2Swi20RkVA7TtRaRdBEZ6M54lFJX3ppdR7nu3cX897dt9GtRg58f6UzvcK0UWpi47YhARPyA94HuQCKwSkRmG2PiXUz3KjDfXbEopa680+fSeG3+Zj5blkD1ciX57O5oOjes4umwlAvu7BqKBrYZY3YAiMi3QD8gPst0DwLTgdZujEUpdQUt2prE6BlxJB47y51ta/N4z8aULqE90YWVO9+ZGsAep+eJQIzzBCJSA+gPXE0OiUBEhgPDAWrVqlXggSqlCkbymVRe/DGeqWsSqVsliKkj2tI6tKKnw1K5cGcicNUBaLI8fxsYaYxJz6m/0BgzEZgIEBUVlXUZSqlCYN76Azz9/XqOnj7Pv7rU46Fr
GmiRuCLCnYkgEajp9DwE2JdlmijgWzsJVAZ6i0iaMWaWG+NSShWgQydTGDt7Az/FHSAsuCyThramWY1yng5LXQJ3JoJVQAMRqQPsBQYDtzpPYIypk/FYRCYDczQJKFU0GGOY/tdeXpgTz9nUdB7v0YjhnepqkbgiyG2JwBiTJiIPYF0N5Ad8aozZICIj7PET3PXaSin3Sjx2hidnrufPLUlE1a7AuAER1K9a2tNhqXxy62l8Y8xPwE9ZhrlMAMaYoe6MRSl1+RwOwxfLd/HqvE0APNe3Kbe3qU0xLRJXpOn1XEqpPNmedIqR02JZvesYnRpW4eX+zQipoEXivIEmAqVUjlLTHUz8cwfv/LqVkv5+vH5TcwZE1tBfBnsRTQRKqWyt35vME9Niid9/gt7h1RjbtylVy2iROG+jiUApdZGU1HTe+XUrE//cQcWgACbcFknPZsGeDku5iSYCpdQFViUcZeS0WHYcPs1NrUJ46rowypXy93RYyo00ESilADh1Lo3x8zbx+bJdhFQoyRf3RNOxgRaJ8wWaCJRS/LEliSdnxLEv+SxD24XyeI9GBGmROJ+h77RSPuz4mfM8PyeeGX/tpV6VIKaNaEur2lokztdoIlDKBxljmLv+AM98v57jZ1J5oGt9Hri6vhaJ81GaCJTyMYdOpPD09+uZv+EgzWqU5bO7o2laXYvE+TJNBEr5CGMMU9ck8uKceM6lORjVqzHDOtShuBaJ83maCJTyAXuOnmH0jDgWbztMdGhFxg0Ip24VLRKnLJoIlPJi6Q7D58sSGD9vM8UEXrihGUOia2mROHUBTQRKeamtB08ycnosf+0+TpdGVXipfzg1ypf0dFiqENJEoJSXSU13MGHhdv772zaCSvjx1qDm3NBCi8Sp7GkiUMqLxCUm8/i0dWw6cJI+EcGM7duUyqVLeDosVchpIlDKC6SkpvPWL1v4+M8dVC5dgom3t+LaptU8HZYqInJNBCLSFrgN6AgEA2eB9cCPwJfGmGS3RqiUytGKHUcYNSOOnYdPM7h1TUb3bkK5klokTuVdjolAROYC+4DvgZeAQ0Ag0BDoCnwvIm8aY2a7O1Cl1IVOpqTy6rxNfLl8NzUrluSrYTG0r1/Z02GpIii3I4LbjTGHsww7Bfxl/70hIvrJU+oK+33TIZ6cGceBEync06EO/3dtQ0oFaE+vyp8cPzkukgD2hv+IMcZkN41Syj2Onj7P8z9sYNbafTSoWprp97cjslYFT4elirjcuobaAOOAo8ALwBdAZaCYiNxhjJnn/hCVUsYY5sTuZ+zsDSSfTeU/1zTgX13rUaK4FolTly+3Y8n3gCeBcsBvQC9jzHIRaQx8A2giUMrNDp5IYczM9fyy8SARIeX46t4YGlcr6+mwlBfJLREUN8YsABCR540xywGMMZv0xylKuZcxhimr9vDSTxs5n+ZgTO8m3NU+VIvEqQKXWyJwOD0+m2WcKeBYlFK2XUdOM3pGHEu3HyGmTkVeHRBBaOUgT4elvFRuiaC5iJwABChpP8Z+HujWyJTyQekOw6QlO3l9wWaKFyvGy/3DGdy6phaJU26V21VDeiZKqStk84GTPDE9lnV7jnNN46q82L8ZweW0SJxyv9yuGsrx5qXGmKO5zN8TeAfwAz4xxozLMr4f1tVIDiANeNgYszgPcSvlNc6nOfhg4Tbe/30bZQL9eWdwC/o2r65F4tQVk1vX0GEgEWsjDVaXUAYD1M1uRhHxA94HutvLWCUis40x8U6T/QrMNsYYEYkAvgMaX1oTlCq61u05zhPTYtl88CT9WlTnmT5hVNIiceoKyy0R/BfoAizBulx0ccYPyfIgGthmjNkBICLfAv2AzERgjDnlNH0QegJa+Yiz59N58+fN/G/xTqqWCeSTO6LoFnaVp8NSPiq3cwT/Eev4tAtwO/BfEVkAfGiM2ZnLsmsAe5yeJwIxWScSkf7AK0BV4DpXCxKR4cBwgFq1auXyskoVbku3H2b0jDh2
HTnDrTG1GNWrMWUDtUic8pxcL0g2lt+BJ4AJwF1Atzws21UH50V7/MaYmcaYxsANWOcLXMUw0RgTZYyJqlKlSh5eWqnC50RKKqNnxHHrxysA+PreGF7uH65JQHlcbieLg7C6cwYBVYAZQKQxZk9O89kSgZpOz0OwKpm6ZIz5U0TqiUhlrV+kvM0v8QcZMyuOpJPnGN6pLo90a0jJAL0oTxUOuZ0jOARsxTo/sA1rj761iLQGMMbMyGHeVUADEakD7AUGA7c6TyAi9YHt9sniSCAAOJKfhihVGB05dY7nfohn9rp9NK5Whom3R9G8ZnlPh6XUBXJLBFOxNv6NufhqHoN1hOCSMSZNRB4A5mNdPvqpMWaDiIywx08ABgB3iEgq1i+XB13CyWilCi1jDLPX7WPs7A2cOpfGI90acn+XegQU1/IQqvCRorbdjYqKMqtXr/Z0GEpla3/yWZ6auZ5fNx2iRc3yjB8YQcOryng6LOXjRGSNMSbK1bjczhHcBnxtjHFkM74eEKw/AlMKHA7DN6t288pPm0hzOHjquibc1b4OfloeQhVyuXUNVQL+FpE1wBogCavGUH2gM9YPzka5NUKlioCdh08zanosK3YepV29Soy7MYJalUp5Oiyl8iS33xG8IyLvAVcD7YEIrL78jVi3sdzt/hCVKrzS0h18umQnbyzYQkDxYrw6IJybo2pqeQhVpOR6k1NjTDrws/2nlLJt3H+CkdNjiU1MpnvYVbx4QzOuKqtFeVXRo3e7VuoSnUtL5/3ft/PB79soV9Kf925tyXXhwXoUoIosTQRKXYK/dh9j5LRYth46Rf+WNXimTxgVggI8HZZSl0UTgVJ5cOZ8Gq/P38KkpTupVjaQSUNb07VxVU+HpVSByFMiEJGrgJeB6saYXiISBrQ1xvzPrdEpVQgs2XaYUTNi2XP0LLe3qc0TPRtRRusDKS+S1yOCycAkYIz9fAswBdBEoLxW8tlUXv5xI1NW76FO5SCmDG9DTN1Kng5LqQKX10RQ2RjznYiMhszyEelujEspj1qw4QBPzVrPkdPnGdG5Hg93a0CgvxaJU94pr4ngtIhUwi4jLSJtgGS3RaWUhySdPMfYHzbwY+x+mgSX5X93tiY8pJynw1LKrfKaCP4PmA3UE5ElWCWpb3JbVEpdYcYYZv69l+fnxHPmXDqPXduQ+zrXw99Pi8Qp75enRGCMWSMinYFGWDec2WyMSXVrZEpdIXuPn2XMzDgWbk4ispZVJK5+VS0Sp3xHXq8a2g68ZpeOzhg2xxjTx22RKeVmDofhqxW7GDd3Ew4Dz14fxh1tQ7VInPI5ee0aSgW6ikgMcJ8x5jzWPYmVKpJ2JJ1i1PQ4ViYcpWODyrzcP5yaFbVInPJNeU0EZ4wxg0TkCWCRiNyMi/sPK1XYpaU7+HjRTt76ZQuBxYvx2sAIBrYK0fIQyqflNREIgDFmvF2Sej5Q0W1RKeUGG/YlM3J6LOv3nqBH06t4oV8zqmqROKXynAieyXhgjPlVRHoAd7onJKUKVkpqOv/9bSsT/thBhVIBfDgkkl7hwZ4OS6lCI7c7lDU2xmwC9to3l3c2x31hKVUw1uw6yhPTYtmedJoBkSE83acJ5UtpkTilnOV2RPAoMBx4w8U4g3XDGqUKndPn0nht/mY+W5ZA9XIl+ezuaDo3rOLpsJQqlHK7Q9lw+3/XKxOOUpfvzy1JjJ4Rx77ks9zRpjaP92xM6RJaaFep7OTWNdQa2GOMOWA/vwMYAOwCxhpjjro/RKXyJvlMKi/8GM+0NYnUrRLEd/e1pXWoXtOgVG5y2036COgGICKdgHHAg0ALYCIw0J3BKZVX89bv5+nvN3D09Hn+1aUeD12jReKUyqvcEoGf017/IGCiMWY6MF1E1ro1MqXy4NDJFJ79fgNz1x8gLLgsk4a2plkNLRKn1KXINRGISHFjTBpwDdaJ47zOq5TbGGOYtiaRF3/cyNnUdJ7o2Yh7O9bVInFK5UNuG/Nv
gD9E5DBwFlgEICL10TLUykP2HD3DkzPjWLT1MK1DKzBuQAT1qpT2dFhKFVm5XTX0koj8CgQDC4wxGWUlimGdK8iRiPQE3gH8gE+MMeOyjB8CjLSfngLuN8asu7QmKF/hcBg+X5bA+PmbEeD5fk25LaY2xbRInFKXJdfuHWPMchfDtuQ2n4j4Ae8D3YFEYJWIzDbGxDtNthPobIw5JiK9sE5Ax+Q1eOU7th06xajpsazedYxODavwcv9mhFTQInFKFQR39vNHA9uMMTsARORboB+QmQiMMUudpl8OhLgxHlUEpaY7mPjnDt75ZSslA/x446bm3BhZQ4vEKVWA3JkIagB7nJ4nkvPe/j3AXFcjRGQ49onqWrVqFVR8qpBbvzeZJ6bFEr//BL3Dq/Fc32ZUKVPC02Ep5XXcmQhc7bK5LF0tIl2xEkEHV+ONMROxuo2IiorS8tdeLiU1nXd+3crEP3dQMSiACbe1omezap4OSymv5c5EkAjUdHoeAuzLOpGIRACfAL2MMUfcGI8qAlYlHGXktFh2HD7NzVEhjOkdRrlS/p4OSymv5s5EsApoICJ1gL3AYOBW5wlEpBYwA7g9Lyeglfc6dS6N8fM28fmyXYRUKMmX98TQoUFlT4ellE9wWyIwxqSJyANYN7HxAz41xmwQkRH2+AlY9zmoBHxgn/xLM8ZEuSsmVTj9vvkQY2bEsf9ECne1D+WxaxsRpEXilLpi5J+fBhQNUVFRZvXq1Z4OQxWAY6fP88KceGb8vZf6VUvz6oAIWtWu4OmwlPJKIrImux1t3e1SV5wxhp/iDvDs7PUcP5PKg1fX54Gr61OiuBaJU8oTNBGoK+rQiRSemrWeBfEHCa9Rjs/vjiGsellPh6WUT9NEoK4IYwxTVyfywo/xnE9zMLpXY+7pUIfiWiROKY/TRKDcbs/RM4yeEcfibYeJrlORcTeGU1eLxClVaGgiUG6T7jB8tjSB1+Zvxq+Y8OINzbg1upYWiVOqkNFEoNxi68GTPDE9lr93H6dLoyq83D+c6uVLejospZQLmghUgTqf5mDCH9t577dtBJXw4+1BLejXoroWiVOqENNEoApMbOJxnpgWy6YDJ7m+eXWevT6MyqW1SJxShZ0mAnXZUlLTeevnLXy8aAdVypTg4zui6B52lafDUkrlkSYCdVmW7zjCqOmxJBw5wy3RNRnVqwnlSmqROKWKEk0EKl9OpqQybu4mvlqxm1oVS/H1sBja1dcicUoVRZoI1CX7bdNBxsxcz8ETKQzrUIdHr21IqQD9KClVVOm3V+XZ0dPnef6HDcxau4+GV5XmgyHtaFlLi8QpVdRpIlC5MsbwQ+x+xs7ewMmUVP5zTQP+3bU+AcW1PIRS3kATgcrRgWSrSNwvGw/SPKQcrw6MoXE1LRKnlDfRRKBcMsbw7ao9vPzjRlIdDsb0bsLdHergp+UhlPI6mgjURXYdOc2o6XEs23GENnUrMu7GCEIrB3k6LKWUm2giUJnSHYZJS3by+oLN+Bcrxsv9wxncuqYWiVPKy2kiUABsPmAViVu35zjXNK7Ki/2bEVxOi8Qp5Qs0Efi482kOPli4jfd/30aZQH/evaUl10cEa5E4pXyIJgIftnbPcUZOi2XzwZP0a1GdZ69vSsWgAE+HpZS6wjQR+KCz59N5Y8FmPl2yk6plAvnfnVFc00SLxCnlqzQR+Jil2w8zanocu4+e4daYWozq1ZiygVokTilfponAR5xISeWVnzbyzco91K5Uim/ubUPbepU8HZZSqhDQROADfok/yJhZcSSdPMd9nerycLeGlAzw83RYSqlCQhOBFzty6hxjf4jnh3X7aFytDB/fEUVESHlPh6WUKmQ0EXghYwzfr93Hcz9s4NS5NB7t3pARnetpkTillEtu3TKISE8R2Swi20RklIvxjUVkmYicE5HH3BmLr9h3/Cz3fLaah6espXalIH58qCMPXdNAk4BSKltuOyIQET/gfaA7kAisEpHZxph4
p8mOAg8BN7grDl/hcBi+XrmbcXM3ke4wPN0njKHtQrVInFIqV+7sGooGthljdgCIyLdAPyAzERhjDgGHROQ6N8bh9XYePs2o6bGs2HmU9vUr8Ur/CGpVKuXpsJRSRYQ7E0ENYI/T80QgJj8LEpHhwHCAWrVqXX5kXiIt3cH/Fu/kzZ+3EFC8GOMHRHBTVIiWh1BKXRJ3JgJXWyOTnwUZYyYCEwGioqLytQxvE7/vBCOnxxK3N5nuYVfx4g3NuKpsoKfDUkoVQe5MBIlATafnIcA+N76eTziXls57v23jw4XbKV/Kn/dvjaR3eDU9ClBK5Zs7E8EqoIGI1AH2AoOBW934el5vza5jjJwey7ZDp7ixZQ2e7hNGBS0Sp5S6TG5LBMaYNBF5AJgP+AGfGmM2iMgIe/wEEakGrAbKAg4ReRgIM8accFdcRdGZ82m8Nn8zk5cmEFw2kEl3taZro6qeDksp5SXc+oMyY8xPwE9Zhk1wenwAq8tIZWPx1sOMmhFL4rGz3N6mNk/0bEQZLRKnlCpA+sviQir5bCov/RjPd6sTqVM5iO/ua0t0nYqeDksp5YU0ERRC8zcc4OlZ6zly+jz3d6nHf65pQKC/FolTSrmHJoJCJOnkOcbO3sCPcftpElyW/93ZmvCQcp4OSynl5TQRFALGGGb8tZfn58Rz9nw6j/doxPBOdfH30/pASin300TgYXuPn+XJGXH8sSWJyFrlGT8wgvpVy3g6LKWUD9FE4CEOh+HLFbt4de4mDDD2+jBub6tF4pRSV54mAg/YnnSKUdNjWZVwjI4NKvNy/3BqVtQicUopz9BEcAWlpjv4eNEO3v5lK4HFi/HawAgGttIicUopz9JEcIWs35vMyOmxbNh3gp5Nq/H8DU2pWkaLxCmlPE8TgZulpKbz39+2MuGPHVQoFcCHQyLpFR7s6bCUUiqTJgI3Wp1wlCemx7Ij6TQDIkN4uk8TypfSInFKqcJFE4EbnD5nFYn7bFkC1cuV5LO7o+ncsIqnw1JKKZc0ERSwP7Yk8eSMOPYln+XOtqE83qMRQSV0NSulCi/dQhWQ42fO88KcjUz/K5G6VYKYel9bokK1SJxSqvDTRFAA5sbt5+nvN3DszHn+3bUeD16tReKUUkWHJoLLcOhECs98v4F5Gw7QtHpZPru7NU2ra5E4pVTRookgH4wxTFuTyAtz4klJczCyZ2OGdayjReKUUkWSJoJLtOfoGZ6cGceirYdpHVqBcQMiqFeltKfDUkqpfNNEkEfpDsMXyxIYP38zArzQrylDYmpTTIvEKaWKOE0EebDt0ElGTo9jza5jdG5YhZf6NyOkghaJU0p5B00EOUhNd/DRH9t599dtlCrhx5s3N6d/yxpaJE4p5VU0EWRj/d5kHp8Wy8b9J7guPJixfZtSpUwJT4ellFIFThNBFimp6bz9y1Y+XrSDikEBTLitFT2bVfN0WEqpvDAGHOlg0rP8d1h/lzTOkWUap8cXzZ91eqf/lxxTDq/boDs07V/gq00TgZOVO48yanosOw6fZlBUTZ7s3YRypfw9HZbyFm7fUFzKxitjWD5iuuzXLYj2uorbARhPv8v5I35QzC/L/2IgxS4cVqm+W15eEwFwMiWV8fM288XyXYRUKMmX98TQoUFlT4d15Rjj2b0clxuoS914FcQG6lI3jJe4x1gkSd42UMX8rGEZz50fF8sybeY4/3wsO+u47JZd7OJlS7GLp79ir5vNuGJ+UAjOOfp8Ivh98yHGzIhj/4kU7m5fh8e616VU/Hcw/U+Im5r9jBVC4VjChcO6PuWBvascNox53SAbhztXsfu4/OJKDl/0PG6gihWH4iUucdkFsWHMYUPhkdctVig2Usr9fDYRHDt1jv9N/4E2297kx+KJVCiRDGuw/vK0gISLh/3+4j+PXX6pXH3hXG2gcvgyFw+4xGW7e0NxKa+bj/a63IPLWL5upJQqCG5NBCLSE3gH8AM+McaMyzJe7PG9gTPAUGPM
X+6MyZw7ibwSQgXgMezICqpb8akka2+ymJaaUEoVHW5LBCLiB7wPdAcSgVUiMtsYE+80WS+ggf0XA3xo/3eLMzMepFTs5xeP6DwKmvSBEmWgfO2MBrgrDKWUKlTceUQQDWwzxuwAEJFvgX6AcyLoB3xujDHAchEpLyLBxpj9BR1M7G9TiLCTwI4qV1PrvmkUL+5X0C+jlFJFjjv7MGoAe5yeJ9rDLnUaRGS4iKwWkdVJSUn5CqZiaAR7/WuTeOtC6v57piYBpZSyufOIwFXfStbe+LxMgzFmIjARICoqKl89+iF1m8CY2PzMqpRSXs2dRwSJQE2n5yHAvnxMo5RSyo3cmQhWAQ1EpI6IBACDgdlZppkN3CGWNkCyO84PKKWUyp7buoaMMWki8gAwH+sizU+NMRtEZIQ9fgLwE9alo9uwLh+9y13xKKWUcs2tvyMwxvyEtbF3HjbB6bEB/u3OGJRSSuVMf/mklFI+ThOBUkr5OE0ESinl4zQRKKWUjxPrfG3RISJJwK58zl4ZOFyA4RQF2mbfoG32DZfT5trGmCquRhS5RHA5RGS1MSbK03FcSdpm36Bt9g3uarN2DSmllI/TRKCUUj7O1xLBRE8H4AHaZt+gbfYNbmmzT50jUEopdTFfOyJQSimVhSYCpZTycT6TCESkp4hsFpFtIjLK0/Hkl4jUFJHfRWSjiGwQkf/YwyuKyM8istX+X8FpntF2uzeLSA+n4a1EJM4e965I4b5Rs4j4icjfIjLHfu7VbbZv3TpNRDbZ73dbH2jzI/bner2IfCMigd7WZhH5VEQOich6p2EF1kYRKSEiU+zhK0QkNNegjDFe/4dVBns7UBcIANYBYZ6OK59tCQYi7cdlgC1AGDAeGGUPHwW8aj8Os9tbAqhjrwc/e9xKoC3WneLmAr083b5c2v4o8DUwx37u1W0GPgOG2Y8DgPLe3Gas29TuBEraz78Dhnpbm4FOQCSw3mlYgbUR+BcwwX48GJiSa0yeXilXaMW3BeY7PR8NjPZ0XAXUtu+B7sBmINgeFgxsdtVWrPtDtLWn2eQ0/BbgI0+3J4d2hgC/AlfzTyLw2jYDZe2NomQZ7s1tzriHeUWsEvlzgGu9sc1AaJZEUGBtzJjGflwc65fIklM8vtI1lPEBy5BoDyvS7EO+lsAK4Cpj393N/l/Vniy7ttewH2cdXli9DTwBOJyGeXOb6wJJwCS7O+wTEQnCi9tsjNkLvA7sBvZj3bFwAV7cZicF2cbMeYwxaUAyUCmnF/eVROCqf7BIXzcrIqWB6cDDxpgTOU3qYpjJYXihIyJ9gEPGmDV5ncXFsCLVZqw9uUjgQ2NMS+A0VpdBdop8m+1+8X5YXSDVgSARuS2nWVwMK1JtzoP8tPGS2+8riSARqOn0PATY56FYLpuI+GMlga+MMTPswQdFJNgeHwwcsodn1/ZE+3HW4YVRe6CviCQA3wJXi8iXeHebE4FEY8wK+/k0rMTgzW3uBuw0xiQZY1KBGUA7vLvNGQqyjZnziEhxoBxwNKcX95VEsApoICJ1RCQA6wTKbA/HlC/2lQH/AzYaY950GjUbuNN+fCfWuYOM4YPtKwnqAA2Alfbh50kRaWMv8w6neQoVY8xoY0yIMSYU6737zRhzG97d5gPAHhFpZA+6BojHi9uM1SXURkRK2bFeA2zEu9ucoSDb6LysgVjfl5yPiDx90uQKnpzpjXWFzXZgjKfjuYx2dMA6zIsF1tp/vbH6AH8Fttr/KzrNM8Zu92acrp4AooD19rj3yOWEUmH4A7rwz8lir24z0AJYbb/Xs4AKPtDm54BNdrxfYF0t41VtBr7BOgeSirX3fk9BthEIBKYC27CuLKqbW0xaYkIppXycr3QNKaWUyoYmAqWU8nGaCJRSysdpIlBKKR+niUAppXycJgLlVURkjF29MlZE1opIzCXOP0JE7rjMGEKdK0sWFBHpIiLtnJ5PFpGBBf06yvcU93QAShUUEWkL9MGqznpO
RCpjVe3M6/zFjTET3Bbg5esCnAKWejgO5WX0iEB5k2DgsDHmHIAx5rAxZh9k1m7/Q0TWiMh8p5/zLxSRl0XkD+A/IjJWRB6zx9UTkXn2PItEpLE9/Cax6uWvE5E/cwpIrHsovCYiq+yjlPvs4V3s186438BXTvXke9vDFtt15ufYBQZHAI/YRzod7ZfoJCJLRWSHHh2o/NJEoLzJAqCmiGwRkQ9EpDNk1mb6LzDQGNMK+BR4yWm+8saYzsaYN7IsbyLwoD3PY8AH9vBngB7GmOZA31xiugerimZroDVwr10qAKzKsQ9j1ZyvC7QXkUDgI6xfkHYAqgAYYxKACcBbxpgWxphF9jKCsX5t3gcYl+saUsoF7RpSXsMYc0pEWgEdga7AFLHuRrcaaAb8bO90+2H9xD/DlKzLsqu7tgOmyj83typh/18CTBaR77AKo+XkWiDCaW+9HFa9mPNYNWMS7ddbi1Wj/hSwwxiz057+G2B4DsufZYxxAPEiclUusSjlkiYC5VWMMenAQmChiMRhFd9aA2wwxrTNZrbTLoYVA44bY1q4eI0R9kno64C1ItLCGHMkm2UL1lHF/AsGinQBzjkNSsf6Pl7qLRWdl1FobseoihbtGlJeQ0QaiUgDp0EtgF1Yxbqq2CeTERF/EWma07KMdY+HnSJykz2PiEhz+3E9Y8wKY8wzWHd/qpnDouYD99vdU4hIQ7FuMJOdTUBd+ec+s4Ocxp3Euj2pUgVKjwiUNykN/FdEygNpWNUXhxtjzttdM++KSDmsz/3bwIZcljcE+FBEngL8se6FsA54zU44glUpcl0Oy/gEq8vnL/tkcBJwQ3YTG2POisi/gHkichiremSGH4BpItIPeDCX2JXKM60+qlQhIyKl7fMdArwPbDXGvOXpuJT30q4hpQqfe+2TxxuwTi5/5NlwlLfTIwKllPJxekSglFI+ThOBUkr5OE0ESinl4zQRKKWUj9NEoJRSPu7/AS78pmJpna/1AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import string\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(444)\n",
    "\n",
    "# %matplotlib inline is an IPython magic: it renders plots inline, so plt.show() can be omitted.\n",
    "%matplotlib inline\n",
    "\n",
    "def mem_usage(obj, index=False, total=True, deep=True):\n",
    "    \"\"\"Memory usage of pandas Series or DataFrame.\"\"\"\n",
    "    # Ported from https://www.dataquest.io/blog/pandas-big-data/\n",
    "    usg = obj.memory_usage(index=index, deep=deep)\n",
    "    if isinstance(obj, pd.DataFrame) and total:\n",
    "        usg = usg.sum()\n",
    "    # Bytes to megabytes\n",
    "    return usg / 1024 ** 2\n",
    "\n",
    "catgrs = tuple(string.printable)\n",
    "\n",
    "lengths = np.arange(1, 10001, dtype=np.uint16)\n",
    "sizes = []\n",
    "for length in lengths:\n",
    "    obj = pd.Series(np.random.choice(catgrs, size=length))\n",
    "    cat = obj.astype('category')\n",
    "    sizes.append((mem_usage(obj), mem_usage(cat)))\n",
    "sizes = np.array(sizes)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(sizes)\n",
    "ax.set_ylabel('Size (MB)')\n",
    "ax.set_xlabel('Series length')\n",
    "ax.legend(['object dtype', 'category dtype'])\n",
    "ax.set_title('Memory usage of object vs. category dtype')"
   ]
  },
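  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the plot shows, memory for the category dtype grows only slowly with the number of elements (each element is stored as a small integer code), while memory for the object dtype grows linearly and much more steeply.\n",
    "\n",
    "Besides the memory saving, another benefit is a large gain in computational efficiency. For a Series of category dtype, string operations via `.str` are applied to the unique values in `.cat.categories` rather than to every element of the Series. Each unique value is processed once, and the result is then mapped back to all elements of that category.\n",
    "\n",
    "However, category data is less flexible. For example, to insert a value that has not appeared before, you must first add it to the `.categories` container and only then assign the value."
   ]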
  },
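  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the point about string operations, the sketch below applies `.str.upper()` to the same data stored as object and as category and confirms that the results agree. The Series `demo` is made up for illustration. According to the pandas documentation, `.str` on a categorical Series returns a regular (non-category) Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "demo = pd.Series(['rose', 'navy', 'rose', 'mint green'] * 1000)\n",
    "demo_cat = demo.astype('category')\n",
    "\n",
    "upper_obj = demo.str.upper()      # works on all 4000 elements\n",
    "upper_cat = demo_cat.str.upper()  # works on the 3 categories, then maps back\n",
    "\n",
    "print(upper_cat.dtype)\n",
    "print((upper_obj == upper_cat).all())"
   ]
  },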
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "ccolors = colors.astype('category')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "ccolors.iloc[5] = 'a new color'\n",
    "ValueError                                Traceback (most recent call last)\n",
    "<ipython-input-15-1766a795336d> in <module>\n",
    "----> 1 ccolors.iloc[5] = 'a new color'\n",
    "\n",
    "ValueError: Cannot setitem on a Categorical with a new category, set the categories first\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "ccolors = ccolors.cat.add_categories(['a new color'])\n",
    "ccolors.iloc[5] = 'a new color' "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your work consists mainly of setting values or reshaping data rather than running new computations, the category dtype is not that useful."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
